Integer implementation of signal processing, in particular signal transforms



BACKGROUND OF THE INVENTION 

It is well known that common signal processing operations like FFTs,
convolutions, DCT transforms, etc., often require large dynamic ranges for the variables
employed in such algorithms. This leads to implementations using floating point arithmetic
rather than fixed point arithmetic, because the latter yields larger rounding noise at equal
word length. Let us first recall the distinction between integer or fixed point arithmetic on the
one hand and floating point arithmetic on the other.

Our goal is to represent numbers in the memory or registers of a computer or
digital circuit in the form of binary digits ('0's and '1's). Because of their discrete nature we
can only represent a finite set of numbers; all other numbers are "rounded" or "truncated" to
one of the representable values, leading to quantization noise. For the sake of the argument
let us focus on numbers between -1.0 and 1.0, and say that we have 16 bits available to
represent numbers in this range.

- for fixed point numbers, all representable numbers are of the form n·2^(-15), where n is an
integer in the range [-2^15 ... 2^15-1]. The representable numbers are uniformly spaced. The
dynamic range, which is the ratio of the largest to the smallest (non-zero) representable
number, is 2^15 ≈ 3·10^4.

- for floating point numbers, all representable numbers are of the form
s·(0.5 + m/2^(a+1))·2^(e-b), where m is an a-bit integer, and (0.5 + m/2^(a+1)) is called the
"mantissa" and obviously lies between 0.5 and 1. s is a 1-bit "sign", e is called the
"exponent", a (15-a)-bit number, and b is the exponent bias. As an example take a 7-bit
mantissa and an 8-bit exponent. Then the range 0.5...1 (set e=b) contains 128 representable
numbers 1/256 apart. The range 0.25...0.5 (set e=b-1) also contains 128 representable
numbers, 1/512 apart, etc. We see that the representable numbers are packed closer together
the closer we get to 0, in a logarithmic fashion. The exponent bias b sets the origin of this
quasi-logarithmic scale. In this example the dynamic range is about 2^256 ≈ 10^77.
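The two formats described above can be made concrete with a small sketch. It uses the 7-bit mantissa example from the text; the bias value 128 and all function names are illustrative assumptions, not part of the original disclosure.

```python
from fractions import Fraction

A = 7        # mantissa bits, as in the example in the text
BIAS = 128   # hypothetical exponent bias b, chosen only for illustration


def fixed_spacing(bits=16):
    """Spacing between neighbouring fixed point values n * 2^-(bits-1)."""
    return Fraction(1, 1 << (bits - 1))


def float_value(s, m, e, b=BIAS, a=A):
    """Value s * (0.5 + m / 2^(a+1)) * 2^(e-b), with s = +1 or -1 and 0 <= m < 2^a."""
    mantissa = Fraction(1, 2) + Fraction(m, 1 << (a + 1))
    return s * mantissa * Fraction(2) ** (e - b)


def binade(e, b=BIAS, a=A):
    """All positive representable values that share the exponent e."""
    return [float_value(1, m, e, b, a) for m in range(1 << a)]


top = binade(BIAS)        # values in the range 0.5 ... 1
below = binade(BIAS - 1)  # values in the range 0.25 ... 0.5

print(len(top), top[1] - top[0])        # count and spacing for e = b
print(len(below), below[1] - below[0])  # count and spacing for e = b - 1
print(fixed_spacing())                  # uniform spacing of the 16-bit fixed point grid
```

Exact rational arithmetic is used here so that the spacings can be compared without any rounding of their own.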

Floating point numbers are a trade-off between a large dynamic range and a
locally uniform distribution of representable numbers. This meshes nicely with the idea that
in many relevant computations we need to represent small numbers with a small granularity
and large numbers with a large granularity. Another way to say this: the floating point
representation matches the "natural distribution of numbers", which is roughly logarithmic,
much more closely. For that reason, in practice floating point calculations almost invariably
lead to much more accurate results than fixed point calculations with words of the same size
(number of bits).

The major drawback of floating point numbers is that they require more
complex hardware to perform additions, multiplications, etc. For example, for a floating point
addition both operands have to be normalized to the same exponent, followed by an ordinary
addition and a final re-scaling of the exponent. In software, floating point operations are
therefore usually much slower.
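The addition steps just described can be sketched with a toy format in which a value is stored as an integer mantissa and an exponent (value = mantissa · 2^exponent). The format, the 8-bit mantissa limit and the name fp_add are assumptions made for illustration; rounding modes and re-normalization after cancellation are deliberately left out.

```python
def fp_add(x, y, mant_bits=8):
    """Add two toy floating point numbers given as (mantissa, exponent) pairs."""
    (mx, ex), (my, ey) = x, y
    # Step 1: bring both operands to the same (larger) exponent.
    e = max(ex, ey)
    mx_aligned = mx >> (e - ex)
    my_aligned = my >> (e - ey)
    # Step 2: ordinary integer addition of the aligned mantissas.
    m = mx_aligned + my_aligned
    # Step 3: re-scale the exponent until the mantissa fits again.
    while abs(m) >= (1 << mant_bits):
        m >>= 1
        e += 1
    return m, e


def to_int(value):
    """Convert a (mantissa, exponent) pair with exponent >= 0 to a plain integer."""
    m, e = value
    return m << e


print(fp_add((200, 0), (100, 0)))  # mantissa overflow forces a re-scaling
print(fp_add((200, 3), (64, 0)))   # the smaller operand is shifted before the addition
```

Even in this reduced form an addition needs two data-dependent shifts and a loop, which gives an idea of why it is costlier than a plain integer addition.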

In the case of watermark detection, DSP operations like FFTs must happen
accurately: a watermark is carefully hidden in the content (often in the LSBs), and so the
signal processor must proceed with care so as not to lose it. However, for watermarking in
copy-protection or tracing applications cost is a major issue: it is not a feature which can
warrant a higher price in the store. A manufacturer of watermark detectors has two choices to
control the accuracy:

- use a floating point implementation, with relatively high hardware cost (or high CPU load
for software);

- use a fixed point implementation, but considering the statements about accuracy above,
one is forced to use much longer words for an integer implementation than for a floating
point implementation. This also drives up the cost if many memory words are needed for
storage, and consequently a lot of memory bandwidth is needed.

Applicant's International Patent Application WO 99/45707 discloses a
watermark embedding system (hereinafter referred to as JAWS) to which the invention is
particularly applicable. Fig. 1 shows the signal processing steps of this watermark detection.
Since memory is the largest cost factor for algorithms like JAWS, up to now only the
floating point implementation (with 17-bit words: an 8-bit mantissa and an 8-bit exponent) is
available. For instance, in the 2D FFT step, if one wanted to use integers, one would need
about 20...24 bits (depending on the video content) to get similar accuracy to the 17-bit
floating point implementation.

From the literature many methods are known which can help to reduce
the word length for integer FFTs, e.g.:

- insertion of guard bits (shifting the input to the right by k bits, if the signal processing
step is expected to increase the dynamic range by k bits). An FFT on N=2^n points
increases the dynamic range by N (worst case), so the required insertion of n guard bits is
like pre-dividing the input by a factor of N.

- using block floating points. Block floating points are variables
which represent a different range of numbers (like [-1...1], or [-1/4...1/4], or [-8...8]) at
different stages in the processing step, depending on the required dynamic range. For
instance, in an FFT one would choose a new range for the 16-bit variables to represent
after every one of the n stages.

Although these methods are helpful, in general they still cause too much
quantization noise to allow e.g. robust watermark detection.
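The two known methods listed above can be sketched as follows. This is a minimal illustration under stated assumptions: samples are plain Python integers, the block shares one shift count as its "exponent", and the names add_guard_bits and block_normalize are hypothetical.

```python
def add_guard_bits(samples, k):
    """Insert k guard bits: arithmetic right shift, i.e. pre-division by 2^k."""
    return [x >> k for x in samples]


def block_normalize(samples, word_bits=16):
    """Scale a whole block by one common shift so that its peak fits the word.

    Returns the shifted samples and the shift count (the shared block exponent).
    """
    limit = (1 << (word_bits - 1)) - 1
    peak = max(abs(x) for x in samples)
    shift = 0
    while (peak >> shift) > limit:
        shift += 1
    return [x >> shift for x in samples], shift


print(add_guard_bits([1024, -1024, 5], 3))  # the small sample 5 becomes 0
print(block_normalize([100000, -3, 7]))     # one large value sets the scale for all
```

In both cases a single large value determines how many low-order bits of every other sample are discarded, which is the source of the quantization noise mentioned above.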



OBJECT AND SUMMARY OF THE INVENTION

It is one of the purposes of this invention disclosure to show how it is possible
to use a fixed point implementation to do signal processing with words which are no longer
than those of the floating point implementation.
These and other objects are achieved by a method and arrangement as claimed
in the appended claims.



BRIEF DESCRIPTION OF THE DRAWINGS 

Fig. 1 shows a block diagram indicating the signal processing steps involved
to perform a watermark detection using the JAWS system.

Fig. 2 shows pictures illustrating the operation of the JAWS watermark
detection system which is shown in Fig. 1.

Fig. 3 shows the power spectrum density of the output of the 2D FFT without
(left) and with (right) pre-filtering in accordance with the invention.

Fig. 4 shows a diagram illustrating watermark detection reliability without and
with pre-filtering in accordance with the invention.

Fig. 5 shows a block diagram of the JAWS watermark detection architecture
including the pre-filter in accordance with the invention.

Fig. 6 shows a block diagram illustrating an algorithm for performing cyclical
correlations using pre-filters to compress the dynamic range of the input and a post-filter to
undo the effects of the pre-filters.
DESCRIPTION OF EMBODIMENTS

For many applications the statistics of the input signal to the signal processing
step are well known. For instance, in the case of the watermark detector shown in Fig. 1, the
output of the IDCT (input of the 2D FFT) contains essentially the original video content,
which is strongly peaked in the low horizontal and vertical frequencies. This is illustrated in
Fig. 2, where the left picture shows a typical input to the 2D FFT (here 1 second of folded
video), and the right picture shows an intensity plot of the spectrum at the output of the 2D
FFT, with a dynamic range of about 21 bits. The zero frequency is at the center of the picture.
The horizontal and vertical DC frequencies cause the FFT to overflow. Preventing overflow by
re-scaling yields even more quantization noise. This quantization noise ultimately obscures
the watermark.

According to this invention it is preferred to first apply a (high-pass) filter to
the input of the FFT which suppresses the low frequencies with large amplitude, thus
reducing the required dynamic range. In fact the filter should be chosen such that it
emphasizes those frequencies that contain most of the watermark energy, and causes the least
quantization noise in that range. As an example we have filtered the fold buffer (the
left picture in Fig. 2) with the filter

     1  -2   1
    -2   4  -2
     1  -2   1

before inputting it into the 2D FFT.

The result of this pre-filtering is shown in Fig. 3. In this Figure, the left diagram shows the
two-dimensional power density spectrum of the output of the 2D FFT without pre-filtering.
The vertical axis is in powers of 2. It is clear that about 21 bits of dynamic range are required
to represent the bulk of the coefficients. The right diagram shows the output of the 2D FFT
with pre-filtering, using the filter mentioned above. The required dynamic range has been
reduced to about 8 bits. The 2D spectra have been projected onto the plane given by x+y=0.
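The 3x3 pre-filter given above can be sketched in a few lines. The sketch assumes that the fold buffer is treated cyclically (wrap-around at the borders), which matches the periodicity implied by the FFT but is not stated in the text; the function name prefilter and the small test buffers are hypothetical.

```python
KERNEL = [[1, -2, 1],
          [-2, 4, -2],
          [1, -2, 1]]


def prefilter(buf):
    """Apply the 3x3 kernel to a 2D integer buffer with cyclic borders.

    The kernel is symmetric, so correlation and convolution coincide here.
    """
    rows, cols = len(buf), len(buf[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = 0
            for i in range(3):
                for j in range(3):
                    acc += KERNEL[i][j] * buf[(r + i - 1) % rows][(c + j - 1) % cols]
            out[r][c] = acc
    return out


flat = [[100] * 4 for _ in range(4)]                   # pure DC content
stripes = [[10 * r] * 4 for r in range(4)]             # varies only from row to row
impulse = [[1 if (r, c) == (1, 1) else 0 for c in range(4)] for r in range(4)]

print(prefilter(flat))     # all zeros: DC is removed
print(prefilter(stripes))  # all zeros: content at zero horizontal frequency is removed
print(prefilter(impulse))  # the kernel itself, centered on the impulse
```

Because every row and every column of the kernel sums to zero, any component that is constant along one axis is cancelled, which is exactly the horizontal and vertical DC content that dominates the unfiltered spectrum.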

It is clear that the pre-filtering step causes a major decrease in the dynamic range
required to represent the (intermediate) result. In fact, in combination with the use of 16-bit
block floating point variables throughout the rest of the algorithm, the peak reliability is
almost indistinguishable from that of a 32-bit floating point implementation. Fig. 4 shows the
watermark detection reliability for the same video sequence, processed once with a 32-bit
floating point detector (solid line) and once with a 16-bit integer detector (dashed line).
Because of the normalization step (SPOMF's phase extraction right after the
2D FFT), the effect of the input filter is purged. In other words: although with infinite
precision variables the pre-filter has no effect (its effects are removed by SPOMF), it does
reduce quantization noise significantly for a fixed point implementation.
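The claim that a phase-only normalization purges the pre-filter can be checked numerically in one dimension. The sketch below uses the taps [-1, 2, -1], whose outer product with itself gives the 3x3 kernel quoted earlier; the plain O(N^2) DFT, the test vector and all names are illustrative assumptions, and floating point is used only to verify the invariance.

```python
import cmath
import math


def dft(x):
    """Direct O(N^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def highpass(x):
    """Cyclic zero-phase filter with taps [-1, 2, -1] centered on each sample."""
    n = len(x)
    return [2 * x[t] - x[(t - 1) % n] - x[(t + 1) % n] for t in range(n)]


def phases(spectrum, eps=1e-9):
    """Phase-only spectrum: each bin divided by its magnitude (0 for empty bins)."""
    return [z / abs(z) if abs(z) > eps else 0 for z in spectrum]


x = [3, 1, 4, 1, 5, 9, 2, 6]
plain = phases(dft(x))
filtered = phases(dft(highpass(x)))

# Largest phase difference over all bins except DC.
print(max(abs(plain[k] - filtered[k]) for k in range(1, len(x))))
```

The frequency response of these taps is 2 - 2·cos(w), which is real and non-negative, so every bin keeps its phase; only the DC bin is nulled, and that bin carries the large-amplitude content the filter is meant to suppress.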
Fig. 5 shows a block diagram of the JAWS watermark detection architecture
including the pre-filter in accordance with the invention to reduce dynamic range / overflow /
quantization noise problems in the 2D FFT. The advantages of pre-filtering for JAWS are:

- cheaper integer-only hardware / smaller CPU load for integer-only software;

- allows implementation on integer-only DSPs (the vast bulk of DSPs in use today, and
certainly the only DSPs available in the low-end market; floating point DSPs are still very
expensive);

- reduces memory consumption for JAWS detection by a factor of 16/18;

- reduces memory bandwidth requirements significantly. Some implementors will not
embed the 72 kbytes of RAM needed by the JAWS system on board of a chip but rather
re-use 72 kbytes of an external DRAM memory chip (for example from a 512 kb drive buffer
in a DVD-ROM drive). Since the memory data bus usually has a width of 16 bits or 32
bits, an 18-bit transfer requires about (32/18) times more data to move across the bus than
necessary. This is particularly problematic in high-speed (16x) DVD-ROM drives, where
memory bandwidth is at a premium.

Although the invention emerged out of research in the field of JAWS
watermarking, it will be appreciated that the method is general enough to benefit other
watermarking methods, or indeed signal processing in general. As an example, let us
compute the (cyclical) correlation between two patterns, A and B, by performing a
multiplication in the frequency domain, as shown in Fig. 6. Again we may employ
appropriately chosen pre-filters F and G, adapted to the statistical behavior of patterns A and B,
to control quantization noise in the FFTs. The effect of the filters (which scale the frequency
components) is undone by the post-filter. If, as is often the case, pattern B is fixed, the
post-filtering step may be combined with the pre-filtering and FFT of pattern B.
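The correlation scheme with pre-filters and a post-filter can be sketched as follows. This is only an illustration under stated assumptions: both pre-filters F and G use the hypothetical taps [-1, 3, -1], chosen because their frequency response 3 - 2·cos(w) is never zero so the post-filter can divide it out; a direct O(N^2) DFT in floating point stands in for the integer FFTs; all names are made up for the example.

```python
import cmath
import math

TAPS = (-1, 3, -1)  # hypothetical pre-filter taps, response 3 - 2*cos(w) >= 1


def dft(x, sign=-1):
    n = len(x)
    return [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    n = len(spectrum)
    return [v / n for v in dft(spectrum, sign=+1)]


def prefilter(x, taps=TAPS):
    """Cyclic 3-tap filter y[t] = taps[0]*x[t-1] + taps[1]*x[t] + taps[2]*x[t+1]."""
    n = len(x)
    return [taps[0] * x[(t - 1) % n] + taps[1] * x[t] + taps[2] * x[(t + 1) % n]
            for t in range(n)]


def response(n, taps=TAPS):
    """Frequency response of prefilter for length-n signals (n >= 3)."""
    h = [0] * n
    h[0], h[1], h[-1] = taps[1], taps[0], taps[2]
    return dft(h)


def correlate_direct(a, b):
    """Reference: c[s] = sum over t of a[t+s] * b[t], indices taken cyclically."""
    n = len(a)
    return [sum(a[(t + s) % n] * b[t] for t in range(n)) for s in range(n)]


def correlate_filtered(a, b):
    """Pre-filter both patterns, multiply spectra, then undo the filters."""
    n = len(a)
    f_resp, g_resp = response(n), response(n)
    a_spec, b_spec = dft(prefilter(a)), dft(prefilter(b))
    product = [a_spec[k] * b_spec[k].conjugate() for k in range(n)]
    post = [product[k] / (f_resp[k] * g_resp[k].conjugate()) for k in range(n)]
    return [v.real for v in idft(post)]


a = [1, 5, 2, 0, 3, 7, 1, 4]
b = [2, 0, 1, 3, 0, 0, 1, 5]
print(correlate_direct(a, b))
print([round(v, 6) for v in correlate_filtered(a, b)])
```

When pattern B is fixed, the quantity conj(B_spec) / (f_resp · conj(g_resp)) can be computed once in advance, which corresponds to combining the post-filter with the pre-filtering and FFT of B as described above.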

As another example, consider the fingerprinting technique described in the
paper "Robust Audio Hashing for Content Identification," by Jaap Haitsma, Ton Kalker and
Job Oostveen, presented at the Content-Based Multimedia Indexing conference 2001,
Brescia, Italy. In this technique a "fingerprint" of an audio signal is generated by splitting its
power spectrum into a series of frequency bands and coding differences between these bands,
both with respect to frequency and with respect to time, in a small number of bits. The
fingerprint thus obtained is robust against a wide range of signal distortions, such as MP3
compression, noise addition, all-pass filtering, etc. Typically, the frequency bands considered
cover the interval from 300 Hz to 3 kHz.

The power spectrum is obtained by applying an FFT to the downsampled and
windowed input signal. As long as a floating point algorithm is used this is just fine.
However, quite often the power spectrum contains a peak near DC which is substantially
higher than the values in the frequency range of interest. This results in excessive
quantization noise in that frequency range if an integer FFT with small dynamic range is
used. Evidently, this may readily lead to spurious bit errors in the fingerprint, not caused by
actual signal distortions but by a deficiency of the implementation. The solution is to
remove the DC peak by applying a high-pass filter to the input signal prior to performing the
FFT, or alternatively, to apply a band-pass filter which only selects the frequencies of
interest.
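The effect of removing the DC peak on the required word length can be illustrated with a toy signal. Everything in this sketch is an assumption made for illustration: the 32-sample signal with an offset of 1000 and a small tone, the first-difference high-pass filter, and the helper names; a real fingerprint extractor would use its own filter and frame length.

```python
import cmath
import math


def dft_magnitudes(x):
    """Magnitudes of a direct O(N^2) DFT."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n)]


def first_difference(x):
    """Cyclic high-pass y[t] = x[t] - x[t-1]; it removes the DC component."""
    return [x[t] - x[t - 1] for t in range(len(x))]


def bits_for(value):
    """Bits needed to hold the integer part of a non-negative magnitude."""
    return max(1, int(value).bit_length())


N = 32
signal = [1000 + round(10 * math.cos(2 * math.pi * 5 * t / N)) for t in range(N)]

raw_peak = max(dft_magnitudes(signal))
filtered_peak = max(dft_magnitudes(first_difference(signal)))

print(bits_for(raw_peak), bits_for(filtered_peak))
```

Without the filter the largest coefficient is the DC term, so the word length is spent on a bin that carries no fingerprint information; after filtering the largest coefficient is the tone itself.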

The invention can also be described in the following manner. A digital signal
processor operating with integer arithmetic circuits has a certain accuracy. Each processing
step (multiplication, addition) increases the number of bits (the word length). For example,
the Fast Fourier Transform having a butterfly structure requires a plurality of such processing
steps to be performed. In practical implementations, the processing steps are recursively
performed by a single integer arithmetic circuit having a given word length, say N. After
each step, the word length of the signal is reduced to the given word length N by rounding,
truncation, or some other smart form of quantization. An obvious way to prevent quantization
errors is to scale down the input signal. However, this results in quantization errors already
being introduced in the input signal. For processes such as watermark detection this is fatal,
since the least significant bits of the input signal constitute precisely the place where the
watermark is embedded.
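Why scaling down the input is fatal can be shown with a deliberately simplified model. The sketch assumes a toy watermark that is written directly into the least significant bit of each sample; this is not how JAWS embeds its pattern, and the names embed_lsb and scale_down are hypothetical.

```python
def embed_lsb(content, bits):
    """Toy watermark: overwrite the least significant bit of each sample."""
    return [(c & ~1) | b for c, b in zip(content, bits)]


def scale_down(samples, k):
    """Scale the input down by 2^k using an arithmetic right shift."""
    return [x >> k for x in samples]


content = [200, 13, 77, 128]
mark = [1, 0, 1, 1]

marked = embed_lsb(content, mark)
unmarked = embed_lsb(content, [0, 0, 0, 0])

print([m & 1 for m in marked])                           # the mark is readable
print(scale_down(marked, 2) == scale_down(unmarked, 2))  # after scaling it is gone
```

After the shift the marked and unmarked signals are identical, so no later processing step can recover the mark.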

In accordance with the invention, the signal is pre-processed by a pre-processor
which reduces the word length and which is invariant with respect to the
subsequent process. The expression "being invariant" means that if the pre-processor
operated with infinite accuracy, it would have no effect on the subsequent process. If such a
pre-processor operates with finite accuracy, it will reduce the quantization noise. The
high-pass pre-filter described above with reference to the JAWS watermark detection process
fulfills this condition, because it is a zero-phase filter and the watermark to be detected is
carried by the phase of the Fourier coefficients.
The invention can be summarized as follows. For some signal processing
applications it is necessary to drive the cost down by choosing an integer-only
implementation, with as small a bit depth as possible. However, economical bit depths
like 16 bits often yield inaccurate results because of excessive rounding noise. This holds in
particular for FFTs as used in watermark detection algorithms like JAWS. For this
reason such algorithms are forced to stick to 18-bit floating point arithmetic, causing higher
silicon cost, memory bandwidth, and CPU load. To deal with this problem we propose to
pre-filter the input signal prior to manipulating it, to control the dynamic range during the
signal processing steps.



CLAIMS



1. A method of processing a signal using a digital signal processor having a
given word length, the method comprising the step of pre-processing said signal using a
pre-processor which reduces the word length and performs an operation which is invariant with
respect to the process being performed by the digital signal processor.


2. A method as claimed in claim 1, wherein said process being performed by the 
digital signal processor is watermark detection, and the pre-processor is a high-pass filter. 

3. A method of processing a signal received in the form of signal samples having
a range of sample values, the method comprising the steps of filtering the signal to reduce the
range of signal sample values in a given band of non-interest, and digitally processing the
filtered signal using integer arithmetic.



4. A digital signal processor comprising: 

- input means for receiving a signal in the form of integer signal samples having a range of
sample values; 

- filtering means to reduce the range of signal sample values in a band of non-interest; 

- a digital signal processing circuit for digitally processing the filtered signal using integer 
arithmetic. 


5. A processor as claimed in claim 4, wherein said digital processing circuit 
comprises a transform circuit for transforming the signal into frequency coefficients. 






6. A processor as claimed in claim 3, wherein said correlation circuit includes a 

Fourier transform circuit for computing said correlation for a plurality of shifts between said 
signal and said predetermined watermark pattern. 
ABSTRACT



For some signal processing applications it is necessary to drive the cost down
by choosing an integer-only implementation, with as small a bit depth as possible. However,
economical bit depths like 16 bits often yield inaccurate results because of excessive
rounding noise. This holds in particular for FFTs as used in watermark detection
algorithms like JAWS. For this reason such algorithms are forced to stick to 18-bit floating
point arithmetic, causing higher silicon cost, memory bandwidth, and CPU load. To deal with
this problem we propose to pre-filter the input signal prior to manipulating it, to control the
dynamic range during the signal processing steps.